Task scheduling strategy based on data stream classification in Heron
ZHANG Yitian, YU Jiong, LU Liang, LI Ziyang
Journal of Computer Applications    2019, 39 (4): 1106-1116.   DOI: 10.11772/j.issn.1001-9081.2018081848
Heron, a new platform for big data stream processing, uses a round-robin algorithm for task scheduling by default. This algorithm considers neither the runtime state of the topology nor the impact of different communication modes among task instances on Heron's performance. To solve this problem, a task scheduling strategy based on Data Stream Classification in Heron (DSC-Heron) was proposed, comprising a data stream classification algorithm, a data stream cluster allocation algorithm and a data stream classification scheduling algorithm. Firstly, the instance allocation model of Heron was established to clarify the differences in communication overhead among the communication modes of task instances. Secondly, based on the data stream classification model of Heron, data streams were classified according to the real-time size of the data streams between task instances. Finally, the packing plan of Heron was constructed using interrelated high-frequency data streams as the basic scheduling unit, minimizing communication cost by converting as many inter-node data streams as possible into intra-node ones. The SentenceWordCount, WordCount and FileWordCount topologies were run in a 9-node Heron cluster. The results show that, compared with the Heron default scheduling strategy, DSC-Heron improves system complete latency, inter-node communication overhead and system throughput by 8.35%, 7.07% and 6.83% respectively; in terms of load balancing, the standard deviations of CPU usage and memory usage of the working nodes decrease by 41.44% and 41.23% respectively. The experimental results show that DSC-Heron effectively improves topology performance, with the most significant optimization on the FileWordCount topology, which is close to a real application scenario.
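The classification and grouping steps described in this abstract might be sketched roughly as follows. This is an illustration under stated assumptions, not the DSC-Heron implementation: the function names, the fixed rate threshold, and the use of union-find to group instances are all hypothetical choices that the abstract does not specify.

```python
from collections import defaultdict


def classify_streams(rates, threshold):
    """Split streams into high- and low-frequency sets by measured rate.

    `rates` maps (upstream, downstream) instance pairs to a measured rate.
    The fixed `threshold` is an assumption made for this sketch.
    """
    high = {pair for pair, rate in rates.items() if rate >= threshold}
    low = set(rates) - high
    return high, low


def cluster_high_frequency(instances, high):
    """Group instances connected by high-frequency streams (union-find)."""
    parent = {i: i for i in instances}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in high:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for i in instances:
        groups[find(i)].add(i)
    # Sort for a deterministic result.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical topology: instance names and rates are invented.
instances = ["spout1", "split1", "split2", "count1"]
rates = {
    ("spout1", "split1"): 900,
    ("spout1", "split2"): 50,
    ("split1", "count1"): 700,
    ("split2", "count1"): 40,
}
high, low = classify_streams(rates, threshold=500)
clusters = cluster_high_frequency(instances, high)
```

Each resulting cluster is a candidate for placement on a single node, so that its high-frequency streams stay intra-node. A real scheduler would also have to respect per-node resource limits, which this sketch ignores.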
Dynamic task dispatching strategy for stream processing based on flow network
LI Ziyang, YU Jiong, BIAN Chen, LU Liang, PU Yonglin
Journal of Computer Applications    2018, 38 (9): 2560-2567.   DOI: 10.11772/j.issn.1001-9081.2017122910
To address the problem that a sharp increase in data input rate raises computing latency and degrades real-time performance on big data stream processing platforms, a dynamic dispatching strategy based on flow network was proposed and applied to the data stream processing platform Apache Flink. Firstly, a Directed Acyclic Graph (DAG) was transformed into a flow network by defining the capacity and flow of every edge, and a capacity detection algorithm was used to determine the capacity of every edge. Secondly, a maximum flow algorithm was used to obtain the improved network and the optimization path, in order to raise cluster throughput while the data input rate is increasing; the feasibility of the algorithm was shown by evaluating its time and space complexity. Finally, the influence of an important parameter on algorithm execution was discussed, and recommended parameter values for different types of jobs were obtained by experiments. The experimental results show that, compared with the original dispatching strategy of Apache Flink, the strategy raises throughput by more than 16.12% during the phases of increasing data input rate across different types of benchmarks, so the dynamic dispatching strategy effectively improves cluster throughput while satisfying the task latency constraint.
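To illustrate what computing a maximum flow over a job DAG involves, here is a minimal sketch. The abstract does not say which maximum flow algorithm the authors used; Edmonds-Karp (breadth-first augmenting paths) is chosen here only because it is short, and the operator names and capacities are invented.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a graph given as {u: {v: capacity}}."""
    # Residual graph: forward edges keep their capacity, reverse edges start at 0.
    residual = {u: dict(vs) for u, vs in capacity.items()}
    for u, vs in capacity.items():
        for v in vs:
            residual.setdefault(v, {}).setdefault(u, 0)
    residual.setdefault(source, {})
    residual.setdefault(sink, {})

    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u

        # Push the bottleneck amount of flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical job graph with edge capacities in records per second.
job = {
    "source": {"map1": 10, "map2": 5},
    "map1": {"sink": 7, "map2": 3},
    "map2": {"sink": 8},
}
throughput_bound = max_flow(job, "source", "sink")
```

In this example the edge from `map1` to `map2` lets the 3 units that `map1` cannot forward directly reach the sink through `map2`, which is the kind of rerouting an optimization path is meant to capture.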
Task scheduling strategy based on topology structure in Storm
LIU Su, YU Jiong, LU Liang, LI Ziyang
Journal of Computer Applications    2018, 38 (12): 3481-3489.   DOI: 10.11772/j.issn.1001-9081.2018040741
To solve the problems of high communication cost and unbalanced load in the default round-robin scheduling strategy of the Storm stream computing platform, a Task Scheduling Strategy based on Topology Structure (TS²) in Storm was proposed. Firstly, work nodes with sufficient available Central Processing Unit (CPU) resources were selected, and only one process was allocated to each work node, eliminating the communication cost between processes within a node and optimizing process deployment. Then, the topology structure was analyzed, the component with the largest degree in the topology was found, and the threads of that component were assigned the highest priority. Finally, subject to the maximum number of threads a node could carry, associated tasks were deployed to the same node as far as possible to reduce the communication cost between nodes, improve cluster load balance and optimize thread deployment. The experimental results show that, in terms of system latency, the average optimization rates of TS² are 16.91% and 5.69% compared with the Storm default scheduling strategy and the offline scheduling strategy respectively, which effectively improves the real-time performance of the system. Additionally, compared with the Storm default scheduling strategy, TS² reduces the communication cost between nodes by 15.75% and improves average throughput by 14.21%.
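The step of finding the component with the largest degree can be sketched as below. This is an assumption-laden illustration rather than the published algorithm: degree is taken to be in-degree plus out-degree, ties are broken alphabetically, and the component names are invented.

```python
def components_by_degree(edges):
    """Return topology components ordered from highest to lowest degree.

    `edges` is a list of (upstream, downstream) component pairs. Degree here
    counts both incoming and outgoing edges, which is an assumption.
    """
    degree = {}
    for src, dst in edges:
        degree[src] = degree.get(src, 0) + 1
        degree[dst] = degree.get(dst, 0) + 1
    # Highest degree first; alphabetical order breaks ties deterministically.
    return sorted(degree, key=lambda c: (-degree[c], c))


# Hypothetical topology.
topology = [
    ("spout", "split"),
    ("split", "count"),
    ("split", "filter"),
    ("count", "sink"),
    ("filter", "sink"),
]
priority_order = components_by_degree(topology)
```

Here `split` has degree 3 and comes first in the ordering, so its threads would be placed before those of the other components.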
Dynamic data stream load balancing strategy based on load awareness
LI Ziyang, YU Jiong, BIAN Chen, WANG Yuefei, LU Liang
Journal of Computer Applications    2017, 37 (10): 2760-2766.   DOI: 10.11772/j.issn.1001-9081.2017.10.2760
To address the problems of unbalanced load and incomplete evaluation of nodes on big data stream processing platforms, a dynamic load balancing strategy based on a load awareness algorithm was proposed and applied to the data stream processing platform Apache Flink. Firstly, the computational delay of each node was obtained by running a depth-first search over the Directed Acyclic Graph (DAG) and used as the basis for evaluating node performance, and the load balancing strategy was created. Secondly, load migration for data streams was implemented based on the data block management strategy, and both global and local load optimization were achieved through feedback. Finally, the feasibility of the algorithm was shown by evaluating its time and space complexity, and the influence of important parameters on algorithm execution was discussed. The experimental results show that the proposed algorithm increases task execution efficiency by optimizing load sharing between nodes, and shortens task execution time by 6.51% on average compared with the traditional load balancing strategy of Apache Flink.
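One way a depth-first search over a DAG can yield a delay figure is sketched below. The abstract does not define how the delay is computed, so this sketch assumes each node has a known local delay and reports the largest cumulative delay on any path from a node to a sink; the function name and the numbers are hypothetical.

```python
def path_delay(dag, delay, node, memo=None):
    """Largest cumulative delay from `node` to any sink, via memoized DFS.

    `dag` maps each node to its list of downstream nodes and `delay` maps
    each node to its own processing delay. Both structures are assumptions.
    """
    if memo is None:
        memo = {}
    if node not in memo:
        # Recurse depth-first into every downstream node.
        downstream = [path_delay(dag, delay, n, memo) for n in dag.get(node, [])]
        memo[node] = delay[node] + max(downstream, default=0)
    return memo[node]


# Hypothetical job graph with per-node delays in milliseconds.
dag = {"source": ["map", "filter"], "map": ["sink"], "filter": ["sink"], "sink": []}
delay = {"source": 2, "map": 5, "filter": 3, "sink": 1}
worst_case = path_delay(dag, delay, "source")
```

With these numbers the slowest path runs through `map`, so a load balancer using this measure would look at `map` first when deciding where to migrate load.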